Assistant Commissioner for Patents
United States Patent and Trademark Office 
Washington, D.C. 20231 



SUBMISSION OF NUCLEOTIDE AND/OR AMINO ACID SEQUENCE DISCLOSURES 
Dear Sirs: 

In connection with the prosecution of the captioned application, which claims priority to 
international application number PCT/GB99/03081 filed September 14, 1999, Applicants submit 
the following items: 

1) An initial computer readable form (CRF) copy of the sequence listing and an initial
paper copy of the sequence listing, both generated on an IBM computer using PatentIn,
beyond the content of the application as filed.

B. Applicants aver that the sequence listings contained on the diskette are identical to
those contained on the paper copy. 



0V/787228 

JC02 Rec'd PCT/PTO 14 MAR 2001



Applicants simultaneously submit a First Preliminary Amendment instructing that the 
Amersham Pharmacia Biotech, Inc.
IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

Application of: A. Hine, et al. Group Art Unit: To be assigned

Serial Number: To be assigned Examiner: To be assigned

Filing Date: 14 March 2001

Title: Gene and Protein Libraries and Methods Relating Thereto



FIRST PRELIMINARY AMENDMENT 

Honorable Assistant Commissioner of Patents 
Box New Patent Application 
Washington, D.C. 20231 



Sir: 

Please consider the following amendments and remarks in connection with the 
prosecution of the captioned application, which claims priority to international application 
number PCT/GB99/03081 filed September 14, 1999. 

IN THE SPECIFICATION 

At the end of the written description, before the claims, please insert the Sequence Listing 
attached hereto. 
IN THE CLAIMS

Please amend the claims as follows: 

[CLAIMS]

WHAT IS CLAIMED IS: 

1. (amended) A set of libraries of genes which code for proteins which are capable of specific
binding interactions with a specific binding partner by amino acid residues at at least two
specified positions including a first specified position and at least one other specified position,
which set of libraries [consists of] comprises:

a) 6 to 20 libraries in which each library has a triplet that codes for at least one but 
less than 20 amino acids at said first specified position, and is [randomised]randomized at the or 
each triplet coding for the said at least one other specified position, the arrangement being such 
that interactions of the proteins coded for by the said 6 to 20 libraries with a specific binding 
partner identifies a triplet that codes for an amino acid at the said first specified position that 
takes part in the specific binding interaction, and 

b) 6 to 20 libraries in each of which libraries said first specified position is 
[randomised] randomized and a different one of said at least one other specified positions has a
triplet that codes for at least one but less than 20 amino acids. 

2. (amended) The set of libraries of genes as claimed in claim 1, which set of libraries [consists
of] comprises:

a) 12 libraries in which each library has a triplet that codes for one or several but less 
than 20 amino acids at the said first determined position, the triplets being as shown in Table 1 or 
Table 2, and 

b) 12 libraries of corresponding design for each of the said one or more other 
determined positions.



3. (amended) The set of libraries of genes as claimed in claim 1 [ or claim 2], wherein the genes 
code for zinc fingers. 

4. (amended) The set of libraries of genes as claimed in claim 3, which set [consists
of] comprises 36 libraries in three groups of 12 libraries which code for amino acids at the -1 and
+3 and +6 positions respectively.

5. (amended) The set of libraries of genes as claimed in claim 3[ or claim 4], wherein each gene
codes for a protein comprising [3] three zinc fingers.

6. (amended) The set of libraries of genes as claimed in claim 5, wherein each gene codes for a 
protein having the sequence (SEQ ID NO: 2) 

TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
where X is any amino acid.

7. (amended) A set of libraries of proteins, which proteins are capable of specific binding 
interactions with a specified binding partner by amino acid residues at at least one specified 
position including a first specified position and at least one other specified position, which set of 
libraries [consists of] comprises:

a) 6 to 20 libraries in which each library has at least one but less than 20 amino acid 
residues at the said first specified position and is [randomised]randomized at the said at least one 
other determined position, the arrangement being such that interaction of the 6 to 20 libraries 
with a specific binding partner identifies an amino acid residue at the said first specified position 
that takes part in the specific binding interaction, and 

b) 6 to 20 libraries in each of which libraries said first specified position is 
[randomised] randomized and a different amino acid is present at at least one other specified 
position. 



8. (amended) The set of libraries of proteins as claimed in claim 7, which set of libraries 
[consists of] comprises:

a) 20 libraries in which each library has one specified amino acid residue at the said
first determined position and is [randomised] randomized at the said one or more other
determined positions, and 

b) 20 libraries of corresponding design for each of the said one or more other 
determined positions. 

9. (amended) The set of libraries or proteins as claimed in claim 7[ or claim 8], wherein the 
proteins are zinc fingers. 

10. (amended) The set of libraries of proteins as claimed in claim 7, which set [consists
of] comprises 60 libraries in three groups of 20 libraries with specified amino acids at the -1 and
+3 and +6 positions respectively. 

11. (amended) The set of libraries of proteins as claimed in claim 9[ or claim 10], wherein each 
protein comprises three zinc fingers. 

12. (amended) The set of libraries of proteins as claimed in claim 11, wherein each protein has 
the sequence (SEQ ID NO: 2) 

TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
where X is any amino acid. 

13. (amended) A set of libraries of genes which code for the set of libraries of proteins defined in 
[any one of claims 7 to 12] claim 7.

14. (amended) A method of identifying a protein which interacts with a specific binding partner,
which method comprises providing a set of libraries of proteins as defined in [any one of claims
7 to 12] claim 7, incubating the specific binding partner with each library of the set, observing
specific binding interactions with certain libraries of the set, and using the observations to 
identify a protein which interacts with the specific binding partner. 

19. (amended) The method as claimed in claim 18, wherein the sets of libraries of proteins are 
[immobilised] immobilized on scintillation proximity assay surfaces and the specific binding
partner is radiolabeled. 

20. (amended) The method of claim 18[ or claim 19], wherein after incubation the scintillation 
proximity assay surfaces are washed to distinguish stronger specific binding interactions from 
weaker ones. 

23. (amended) The method as claimed in claim 21[ or claim 22] wherein after incubation the
binding interactions are washed to distinguish stronger specific binding interactions from weaker 
ones. 

24. (amended) A protein having the sequence (SEQ ID NO: 1) 

TGEKPYKCPECGKSFS[.]KKS[+]HLV[$]AHQRTH
TGEKPYKCPECGKSFS[.]KKS[+]HLV[$]AHQRTH
TGEKPYKCPECGKSFS[.]KKS[+]HLV[$]AHQRTH.

26. (amended) A method of constructing [randomised] randomized gene libraries in which the
number of genes is the same as the number of encoded proteins and which contain no 
termination codons at the predetermined positions of [randomisation] randomization , the method 
comprising the steps of: 

a) providing a template oligonucleotide which is fully [randomised] randomized at 
predetermined codon positions; 

b) for each predetermined codon position providing a pool of selection 
oligonucleotides, wherein each member of said pool contains a different codon selected from the 
group [consisting of] comprising: 



AAA, AAC, ACC, AGC, ATG, ATT, CAG, CAT, CCG, CGC, CTG, GAA, 
GAT, GCG, GGC, GTG, TAT, TGG, TGC, TTT. 
at the predetermined codon position; 

c) selecting one or more selection oligonucleotides from each pool in order to 
encode the required gene or library; 

d) allowing the ligated selected oligonucleotides from each pool to 
[hybridise] hybridize with the template oligonucleotide;

e) forming one or more constructs by ligating the [hybridised] hybridized selection
oligonucleotides together;

f) removing a region from a gene of interest corresponding to the
[hybridised] hybridized product from step e);

g) forming a gene library or genes by ligating the products from step e) into the said 
gene of interest wherein the said gene of interest is contained within a suitable expression vector. 

27. (amended) A method of producing proteins encoded by the [randomised]randomized gene 
libraries of claim 26 comprising the steps of: 

a) transforming a suitable host cell with the gene or gene library of claim [27]26 
construct; 

b) expressing the genes to form proteins; 

c) purifying the proteins. 



REMARKS 



Claims 1-27 are pending in the instant application. Applicants have amended claims 1, 2, 3, 4, 5, 
6, 7, 8, 9, 10, 11, 12, 13, 14, 19, 20, 23, 24, 26, and 27 to more fully conform with U.S. practice
and to delete multiple dependencies. A copy of the marked up claims showing the amendments, 
as well as a clean copy of the claims encompassing the amendments, is attached hereto. 

Applicants respectfully assert that all amendments are fairly based on the specification, and 
respectfully request their entry. 

Applicants believe that the claims, as amended, are in allowable form, and earnestly solicit the 
allowance of claims 1-27. 



Amersham Pharmacia Biotech, Inc. 
800 Centennial Avenue 
P. O. Box 1327 
Piscataway, NJ 08855-1327 

Tel: (732)457-8423 
Fax: (732) 457-8463 
Respectfully submitted,




Royal N. Ronning, Jr. 32,5j 
Attorney for Applicants 



Amended Claims (marked up copy showing amendment(s)) 

[CLAIMS]

WHAT IS CLAIMED IS: 

1. (amended) A set of libraries of genes which code for proteins which are capable of specific
binding interactions with a specific binding partner by amino acid residues at at least two
specified positions including a first specified position and at least one other specified position,
which set of libraries [consists of] comprises:

a) 6 to 20 libraries in which each library has a triplet that codes for at least one but 
less than 20 amino acids at said first specified position, and is [randomised] randomized at the or 
each triplet coding for the said at least one other specified position, the arrangement being such 
that interactions of the proteins coded for by the said 6 to 20 libraries with a specific binding 
partner identifies a triplet that codes for an amino acid at the said first specified position that 
takes part in the specific binding interaction, and 

b) 6 to 20 libraries in each of which libraries said first specified position is 
[randomised] randomized and a different one of said at least one other specified positions has a
triplet that codes for at least one but less than 20 amino acids. 

2. (amended) The set of libraries of genes as claimed in claim 1, which set of libraries [consists
of] comprises:

a) 12 libraries in which each library has a triplet that codes for one or several but less 
than 20 amino acids at the said first determined position, the triplets being as shown in Table 1 or 
Table 2, and 

b) 12 libraries of corresponding design for each of the said one or more other 
determined positions. 

3. (amended) The set of libraries of genes as claimed in claim 1 [ or claim 2], wherein the genes 
code for zinc fingers. 



4. (amended) The set of libraries of genes as claimed in claim 3, which set [consists
of] comprises 36 libraries in three groups of 12 libraries which code for amino acids at the -1 and
+3 and +6 positions respectively. 

5. (amended) The set of libraries of genes as claimed in claim 3 [ or claim 4], wherein each gene 
codes for a protein comprising [3]three zinc fingers. 

6. (amended) The set of libraries of genes as claimed in claim 5, wherein each gene codes for a 
protein having the sequence (SEQ ID NO: 2) 

TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
where X is any amino acid.

7. (amended) A set of libraries of proteins, which proteins are capable of specific binding 
interactions with a specified binding partner by amino acid residues at at least one specified 
position including a first specified position and at least one other specified position, which set of 
libraries [consists of] comprises : 

a) 6 to 20 libraries in which each library has at least one but less than 20 amino acid 
residues at the said first specified position and is [randomised]randomized at the said at least one 
other determined position, the arrangement being such that interaction of the 6 to 20 libraries 
with a specific binding partner identifies an amino acid residue at the said first specified position 
that takes part in the specific binding interaction, and 

b) 6 to 20 libraries in each of which libraries said first specified position is 
[randomised] randomized and a different amino acid is present at at least one other specified
position.

8. (amended) The set of libraries of proteins as claimed in claim 7, which set of libraries
[consists of] comprises:



a) 20 libraries in which each library has one specified amino acid residue at the said 
first determined position and is [randomised] randomized at the said one or more other
determined positions, and 

b) 20 libraries of corresponding design for each of the said one or more other 
determined positions. 

9. (amended) The set of libraries or proteins as claimed in claim 7[ or claim 8], wherein the 
proteins are zinc fingers. 

10. (amended) The set of libraries of proteins as claimed in claim 7, which set [consists 
of] comprises 60 libraries in three groups of 20 libraries with specified amino acids at the -1 and
+3 and +6 positions respectively. 

11. (amended) The set of libraries of proteins as claimed in claim 9[ or claim 10], wherein each 
protein comprises three zinc fingers. 

12. (amended) The set of libraries of proteins as claimed in claim 11, wherein each protein has 
the sequence (SEQ ID NO: 2) 

TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
where X is any amino acid.

13. (amended) A set of libraries of genes which code for the set of libraries of proteins defined in
[any one of claims 7 to 12] claim 7.

14. (amended) A method of identifying a protein which interacts with a specific binding partner, 
which method comprises providing a set of libraries of proteins as defined in [any one of claims 
7 to 12] claim 7, incubating the specific binding partner with each library of the set, observing
specific binding interactions with certain libraries of the set, and using the observations to
identify a protein which interacts with the specific binding partner. 

19. (amended) The method as claimed in claim 18, wherein the sets of libraries of proteins are 
[immobilised] immobilized on scintillation proximity assay surfaces and the specific binding
partner is radiolabeled. 

20. (amended) The method of claim 18[ or claim 19], wherein after incubation the scintillation 
proximity assay surfaces are washed to distinguish stronger specific binding interactions from 
weaker ones. 

23. (amended) The method as claimed in claim 21[ or claim 22] wherein after incubation the
binding interactions are washed to distinguish stronger specific binding interactions from weaker 
ones. 

24. (amended) A protein having the sequence (SEQ ID NO: 1) 

TGEKPYKCPECGKSFS[.]KKS[+]HLV[$]AHQRTH
TGEKPYKCPECGKSFS[.]KKS[+]HLV[$]AHQRTH
TGEKPYKCPECGKSFS[.]KKS[+]HLV[$]AHQRTH.

26. (amended) A method of constructing [randomised]randomized gene libraries in which the 
number of genes is the same as the number of encoded proteins and which contain no 
termination codons at the predetermined positions of [randomisation] randomization, the method
comprising the steps of: 

a) providing a template oligonucleotide which is fully [randomised]randomized at 
predetermined codon positions; 

b) for each predetermined codon position providing a pool of selection 
oligonucleotides, wherein each member of said pool contains a different codon selected from the 
group [consisting of] comprising:

AAA, AAC, ACC, AGC, ATG, ATT, CAG, CAT, CCG, CGC, CTG, GAA,
GAT, GCG, GGC, GTG, TAT, TGG, TGC, TTT.
at the predetermined codon position; 

c) selecting one or more selection oligonucleotides from each pool in order to 
encode the required gene or library; 

d) allowing the ligated selected oligonucleotides from each pool to 
[hybridise] hybridize with the template oligonucleotide;

e) forming one or more constructs by ligating the [hybridised] hybridized selection
oligonucleotides together;

f) removing a region from a gene of interest corresponding to the
[hybridised] hybridized product from step e);

g) forming a gene library or genes by ligating the products from step e) into the said 
gene of interest wherein the said gene of interest is contained within a suitable expression vector. 

27. (amended) A method of producing proteins encoded by the [randomised] randomized gene
libraries of claim 26 comprising the steps of: 

a) transforming a suitable host cell with the gene or gene library of claim [27]26 
construct; 

b) expressing the genes to form proteins; 

c) purifying the proteins. 
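
The library sets of claims 8, 10 and 12 above can be illustrated with a short sketch. It assumes, for illustration only, that the design is applied to a single 28-residue finger repeat of SEQ ID NO: 2, that each library fixes one of the three X positions (-1, +3, +6) to one specified amino acid, and that the remaining X positions stay randomised. The names FINGER, AMINO_ACIDS, variable_positions and design_libraries are hypothetical and do not come from the application.

```python
# Illustrative sketch only; all names here are hypothetical.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"           # the 20 standard amino acids
FINGER = "TGEKPYKCPECGKSFSXKSXLVXHQRTH"       # one repeat of SEQ ID NO: 2


def variable_positions(finger):
    """Return the 0-based indices of the randomised (X) residues."""
    return [i for i, residue in enumerate(finger) if residue == "X"]


def design_libraries(finger):
    """Build one library template per (position, amino acid) pair.

    In each template a single X is replaced by a specified amino acid;
    every other X is left in place to denote a randomised position.
    """
    templates = []
    for pos in variable_positions(finger):
        for aa in AMINO_ACIDS:
            templates.append(finger[:pos] + aa + finger[pos + 1:])
    return templates


libraries = design_libraries(FINGER)
print(variable_positions(FINGER))   # indices of the -1, +3 and +6 residues
print(len(libraries))               # three groups of 20 libraries
print(libraries[0])
```

Under these assumptions the count is 3 x 20 = 60 templates, which matches the "60 libraries in three groups of 20" recited in claim 10.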
Clean copy of claims showing amendment(s)



WHAT IS CLAIMED IS: 

1. (amended) A set of libraries of genes which code for proteins which are capable of specific
binding interactions with a specific binding partner by amino acid residues at at least two 
specified positions including a first specified position and at least one other specified position, 
which set of libraries comprises: 

a) 6 to 20 libraries in which each library has a triplet that codes for at least one but 
less than 20 amino acids at said first specified position, and is randomized at the or each triplet 
coding for the said at least one other specified position, the arrangement being such that 
interactions of the proteins coded for by the said 6 to 20 libraries with a specific binding partner 
identifies a triplet that codes for an amino acid at the said first specified position that takes part in 
the specific binding interaction, and 

b) 6 to 20 libraries in each of which libraries said first specified position is 
randomized and a different one of said at least one other specified positions has a triplet that 
codes for at least one but less than 20 amino acids. 

2. (amended) The set of libraries of genes as claimed in claim 1, which set of libraries 
comprises: 

a) 12 libraries in which each library has a triplet that codes for one or several but less 
than 20 amino acids at the said first determined position, the triplets being as shown in Table 1 or 
Table 2, and 

b) 12 libraries of corresponding design for each of the said one or more other 
determined positions. 
3. (amended) The set of libraries of genes as claimed in claim 1, wherein the genes code for
zinc fingers. 

4. (amended) The set of libraries of genes as claimed in claim 3, which set comprises 36 
libraries in three groups of 12 libraries which code for amino acids at the -1 and +3 and +6 
positions respectively. 

5. (amended) The set of libraries of genes as claimed in claim 3, wherein each gene codes for a 
protein comprising three zinc fingers. 

6. (amended) The set of libraries of genes as claimed in claim 5, wherein each gene codes for a 
protein having the sequence (SEQ ID NO: 2) 

TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
where X is any amino acid. 

7. (amended) A set of libraries of proteins, which proteins are capable of specific binding 
interactions with a specified binding partner by amino acid residues at at least one specified 
position including a first specified position and at least one other specified position, which set of 
libraries comprises: 

a) 6 to 20 libraries in which each library has at least one but less than 20 amino acid 
residues at the said first specified position and is randomized at the said at least one other 
determined position, the arrangement being such that interaction of the 6 to 20 libraries with a 
specific binding partner identifies an amino acid residue at the said first specified position that 
takes part in the specific binding interaction, and 

b) 6 to 20 libraries in each of which libraries said first specified position is 
randomized and a different amino acid is present at at least one other specified position. 
8. (amended) The set of libraries of proteins as claimed in claim 7, which set of libraries
comprises: 

a) 20 libraries in which each library has one specified amino acid residue at the said 
first determined position and is randomized at the said one or more other determined positions, 
and 

b) 20 libraries of corresponding design for each of the said one or more other 
determined positions. 

9. (amended) The set of libraries or proteins as claimed in claim 7, wherein the proteins are zinc 
fingers. 

10. (amended) The set of libraries of proteins as claimed in claim 7, which set comprises 60 
libraries in three groups of 20 libraries with specified amino acids at the -1 and +3 and +6 
positions respectively. 

11. (amended) The set of libraries of proteins as claimed in claim 9, wherein each protein
comprises three zinc fingers.

12. (amended) The set of libraries of proteins as claimed in claim 11, wherein each protein has
the sequence (SEQ ID NO: 2) 

TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
where X is any amino acid. 

13. (amended) A set of libraries of genes which code for the set of libraries of proteins defined in 
claim 7. 

14. (amended) A method of identifying a protein which interacts with a specific binding partner, 
which method comprises providing a set of libraries of proteins as defined in claim 7, incubating 
the specific binding partner with each library of the set, observing specific binding interactions
with certain libraries of the set, and using the observations to identify a protein which interacts 
with the specific binding partner. 

15. The method as claimed in claim 14, wherein the specific binding partner is a 
polynucleotide. 

16. The method as claimed in claim 14, wherein the specific binding interactions are 
observed by radiometric or luminescent assay. 

17. The method as claimed in claim 14, wherein the specific binding interactions are 
observed by imaging means. 

18. The method as claimed in claim 14, wherein the specific binding interactions are 
observed by scintillation proximity assay. 

19. (amended) The method as claimed in claim 18, wherein the sets of libraries of proteins are 
immobilized on scintillation proximity assay surfaces and the specific binding partner is 
radiolabeled. 

20. (amended) The method of claim 18, wherein after incubation the scintillation proximity assay 
surfaces are washed to distinguish stronger specific binding interactions from weaker ones. 

21. The method as claimed in claim 14, wherein the specific binding interactions are
observed by colorimetric means.

22. The method as claimed in claim 21, wherein the specific binding partner is biotinylated
and the specific binding interaction is detected using a signal generating streptavidin conjugate. 






23. (amended) The method as claimed in claim 21, wherein after incubation the binding
interactions are washed to distinguish stronger specific binding interactions from weaker ones. 

24. (amended) A protein having the sequence (SEQ ID NO: 1) 

TGEKPYKCPECGKSFSKKSHLVAHQRTH 
TGEKPYKCPECGKSFSKKSHLVAHQRTH 
TGEKPYKCPECGKSFSKKSHLVAHQRTH. 

25. A gene which codes for the protein of claim 24. 

26. (amended) A method of constructing randomized gene libraries in which the number of 
genes is the same as the number of encoded proteins and which contain no termination codons at 
the predetermined positions of randomization, the method comprising the steps of:

a) providing a template oligonucleotide which is fully randomized at predetermined 
codon positions; 

b) for each predetermined codon position providing a pool of selection 
oligonucleotides, wherein each member of said pool contains a different codon selected from the 
group comprising: 

AAA, AAC, ACC, AGC, ATG, ATT, CAG, CAT, CCG, CGC, CTG, GAA, 
GAT, GCG, GGC, GTG, TAT, TGG, TGC, TTT. 
at the predetermined codon position; 

c) selecting one or more selection oligonucleotides from each pool in order to 
encode the required gene or library; 

d) allowing the ligated selected oligonucleotides from each pool to hybridize with 
the template oligonucleotide; 

e) forming one or more constructs by ligating the hybridized selection 
oligonucleotides together; 

f) removing a region from a gene of interest corresponding to the hybridized product 
from step e); 
g) forming a gene library or genes by ligating the products from step e) into the said
gene of interest wherein the said gene of interest is contained within a suitable expression vector. 

27. (amended) A method of producing proteins encoded by the randomized gene libraries of 
claim 26 comprising the steps of: 

a) transforming a suitable host cell with the gene or gene library of claim 26 
construct; 

b) expressing the genes to form proteins; 

c) purifying the proteins. 
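
The codon set recited in step b) of claim 26 can be checked against the claim's stated aims (one gene per encoded protein, no termination codons) with a short sketch. It assumes the standard genetic code, written here as the usual 64-letter table in T, C, A, G order. The names CODON_TABLE, SELECTION_CODONS and encoded are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# "*" marks a termination codon. (Assumption: the standard code applies.)
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The 20 codons listed in claim 26, step b).
SELECTION_CODONS = [
    "AAA", "AAC", "ACC", "AGC", "ATG", "ATT", "CAG", "CAT", "CCG", "CGC",
    "CTG", "GAA", "GAT", "GCG", "GGC", "GTG", "TAT", "TGG", "TGC", "TTT",
]

encoded = [CODON_TABLE[codon] for codon in SELECTION_CODONS]

# One codon per amino acid means the number of genes equals the number of
# encoded proteins; the absence of "*" means no termination codons.
print(len(set(encoded)) == len(SELECTION_CODONS))
print("*" in encoded)
print("".join(sorted(encoded)))
```

By contrast, a fully degenerate NNN triplet spans all 64 codons, three of which are termination codons, so several genes can encode the same protein.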
GENE AND PROTEIN LIBRARIES AND METHODS RELATING THERETO
Introduction

Naturally occurring proteins are capable of specific binding
interactions with other proteins and other molecules. It is well known that
such proteins can be used as scaffolds and specific amino acid residues
changed in order to improve binding properties. The changes required can
be determined by combinatorial chemistry means. The subject is reviewed
by Per-Ake Nygren and Mathias Uhlen in Curr. Opin. Struct. Biol. (1997) 7,
463-469, who list cyclic peptides, immunoglobulin-like scaffolds, bacterial
receptors, DNA-binding proteins and protease inhibitors as examples of
protein scaffolds. The authors conclude that, starting from a suitable
protein domain, a combinatorial approach coupled with powerful
selection or screening strategies can be used to obtain novel proteins
capable of binding a desired target molecule. But the selection or
screening strategies can be difficult. It is this problem that is addressed by
the present invention.

Zinc fingers are examples of protein scaffolds of the kind
described. Zinc fingers are protein motifs ("mini-domains") which interact
with double-stranded DNA (some also bind RNA). This interaction is
dependent on DNA sequence, and is therefore termed
sequence-specific. The interaction between the zinc finger and its target
DNA sequence is modular: one zinc finger recognises three bases of DNA.
Basic rules concerning the interaction were determined early on by
structural studies (both X-ray crystallography and NMR spectroscopy) of
zinc finger-DNA complexes. In essence, three residues (amino acids)
within the zinc finger make base-specific contacts with the DNA. These
three residues differ greatly between different zinc fingers, allowing a
limited repertoire of different DNA sequences to be recognised. Early
mutagenesis experiments determined that if these variable residues are
changed, a different DNA sequence may be recognised. (A fourth residue
sometimes contributes to DNA recognition, but this residue is
well-conserved between different zinc finger proteins.) In practice then, the zinc
finger may be viewed as a molecular scaffold, which orientates the three
variable residues suitably to enable them to make base-specific contacts
with the DNA.

It would be most advantageous to have available a zinc finger
to bind each trinucleotide (3 bases) of dsDNA. Initial attempts to achieve
this goal centred on the structure-based design of novel zinc finger
proteins. Since 1994, however, several groups have employed
combinatorial libraries of zinc finger proteins and/or target DNA sequences
to identify novel zinc fingers which bind to the required DNA sequences.

One such technique has been developed by Choo and Klug
and is described in WO 96/06166 and in PNAS, 91, 11163-11167 and
11168-11172 (1994). A single library of zinc finger genes was constructed.
The library was based on a naturally occurring zinc finger protein, Zif 268,
which contains three zinc fingers. Only the central finger was randomised
at seven positions. The library of genes was cloned as a fusion to the fd
phage gene pIII. When expressed, a library of bacteriophage resulted, in
which each bacteriophage displayed a randomised zinc finger protein on its
surface. In a first stage assay, this library was incubated with a target DNA
molecule, and individual clones that bound to the target were purified and
sequenced. In a second stage assay, each of those clones selected was
incubated with a variety of related DNA sequences in order to further
investigate its binding properties. The technique is subject to some
inherent disadvantages:

• Deconvolution is not addressed - purification is inherent in
the method. The assay results in a pool of bacteriophage. For
identification purposes, each member of that pool must be cultured
independently and its DNA sequenced.



WO 00/15777 






• The experimental end point is determined empirically. While
the assay is in progress, it is impossible to determine the number of
different phage binding to the target DNA. The end point is therefore
determined empirically, e.g. by 15 washes. Any zinc finger which binds to
the target DNA with sufficient strength to withstand these washes is
selected, and a pool of zinc fingers results. There is no in-built mechanism
to determine relative binding strengths of zinc fingers within this selected
pool; hence the need for a second stage assay.

• Library size. Constructing a library of the size required is
technically difficult - indeed, the authors' largest library is 200 times smaller
than that theoretically required. When expressed therefore, several zinc
finger proteins may be omitted.
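
As a rough illustration of the library-size point above, the number of distinct protein variants for a finger randomised at seven positions can be worked out directly. The sketch assumes that each randomised position may hold any of the 20 standard amino acids. The function name protein_variants is hypothetical, and the corresponding gene-level library size would depend on the codon scheme, which is not stated here.

```python
def protein_variants(n_positions, alphabet_size=20):
    """Number of distinct proteins when n_positions are fully randomised.

    Assumes each position independently takes any of alphabet_size residues.
    """
    return alphabet_size ** n_positions


# Seven randomised positions, as in the single library described above.
print(protein_variants(7))   # 20**7

# For comparison, three randomised positions per finger.
print(protein_variants(3))   # 20**3
```

Each extra randomised position multiplies the count by 20, so the number of variants rises from 8,000 at three positions to 1,280,000,000 at seven.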

The present invention addresses these shortcomings.

Zinc fingers are small protein motifs. They form parts of
larger proteins, but perform their specific function within those proteins.
Zinc fingers exist in tandem arrays: proteins containing between 2 and 37
different zinc fingers have been identified.

In two dimensions, a single zinc finger appears as follows: 



[Diagram: two-dimensional schematic of a single zinc finger]

In this diagram, each circle represents a single amino acid
residue.

The zinc finger is so stable that its structure is unaffected by
the replacement of virtually all residues marked "X" with alanine (Michael
et al., PNAS 89, 4796-4800, 1992). Spaced correctly (as above) the
following requirements are all that are necessary for the formation of a zinc
finger:

• The 2 cysteine (C) residues

• The 2 histidine (H) residues

• The zinc ion (Zn), which is co-ordinated (bound) by the C and
H residues

• Three hydrophobic residues: tyrosine/phenylalanine (Y/F);
phenylalanine (F4); leucine (L10).

Zinc fingers bind to nucleic acids - either DNA or RNA. In
nature, zinc fingers usually form part of transcription factors, but in the
laboratory, it is possible to work with them independently from the rest of
these proteins. The zinc finger exemplified herein binds to double-stranded
DNA. One zinc finger binds to three bases of DNA (a trinucleotide).

Several zinc fingers are usually linked in tandem. Most
frequently, three zinc fingers interact with successive trinucleotides, which
means that altogether, the three zinc fingers will interact with (recognise) a
specific 9 base pair (bp) sequence of DNA. Each zinc finger will recognise
a specific trinucleotide. However, nature has only provided a limited
repertoire of zinc fingers, so the number of 9 base pair sequences which
can be recognised is very limited.

The mechanism of DNA recognition is sequence-specific and
surprisingly simple. Three residues (amino acids) within the zinc finger
make contacts (hydrogen bonds or van der Waals interactions, for
example) with three bases of DNA. Most of these contacts are with one
strand of the DNA.

Many experiments have shown that if the three interacting
residues (here named α, β and γ) are changed, the resulting zinc finger will
recognise a different sequence of DNA. Moreover, if a library of zinc finger
proteins is made in which α, β and γ are randomised, new zinc finger
proteins may be identified by screening the library with a specific sequence
of DNA.

There are 64 possible trinucleotides:

Number of trinucleotides NNN = 4 x 4 x 4 = 64

where each N is one of A, C, G or T.
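The count of 64 can be checked by simple enumeration. The following is only an illustrative sketch; the function name is hypothetical and is not part of the specification.

```python
from itertools import product

BASES = "ACGT"  # the four possible nucleotides at each position N


def all_trinucleotides():
    """Return every three-base sequence NNN over A, C, G and T."""
    return ["".join(triplet) for triplet in product(BASES, repeat=3)]


trinucleotides = all_trinucleotides()
print(len(trinucleotides))  # 4 x 4 x 4
```

Each entry of the list is one candidate target for which an optimal zinc finger is sought.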

Therefore, 64 different zinc finger proteins, each of which
binds optimally to one trinucleotide, would represent a complete zinc finger
code. A problem (addressed by this invention) is to develop such a code.

This invention involves applying the principles of
combinatorial chemistry to the problem. The key to any combinatorial
system (whether biological, chemical or any other system) is deconvolution:
the identification of an active substituent from within a mixture. The key to
discovering an optimal zinc finger for each trinucleotide is to identify the
optimum combinations of residues α, β and γ. There will be an optimum
combination of α, β and γ for each trinucleotide. By using multiple libraries
of zinc fingers, with highly controlled overlap between the libraries,
deconvolution can be achieved without purification.

The Invention 

In one aspect the invention provides a set of libraries of
genes which code for proteins which are capable of specific binding
interactions by virtue of amino acid residues at two or more determined
positions including a first determined position and one or more other
determined positions, which set of libraries consists of:

a) 6 to 20 libraries in which each library has a triplet that codes
for one or several but less than 20 amino acids at the said first determined
position, and is randomised at the triplet or triplets coding for the said one
or more other determined positions, the arrangement being such that
interactions of the proteins coded for by the said 6 to 20 libraries with a
specific binding partner identifies a triplet that codes for an amino acid at
the said first determined position that takes part in the specific binding
interaction, and

b) 6 to 20 libraries of corresponding design for each of the said
one or more other determined positions.

In another aspect the invention provides a method of
constructing randomised gene libraries in which the number of genes is the
same as the number of encoded proteins and which contain no termination
codons at the predetermined positions of randomisation, the method
comprising the steps of:

a) providing a template oligonucleotide which is fully
randomised at one or more predetermined codon positions;

b) for each predetermined codon position providing a pool of
selection oligonucleotides, wherein each member of said pool contains a
different codon selected from the group consisting of

AAA, AAC, ACC, AGC, ATG, ATT, CAG, CAT, CCG, CGC, CTG, GAA,
GAT, GCG, GGC, GTG, TAT, TGG, TGC, TTT

at the predetermined codon position;

c) selecting one or more selection oligonucleotides from each
pool in order to encode the required gene or library;

d) allowing the selected selection oligonucleotides from each
pool to hybridise with the template oligonucleotide;

e) forming one or more constructs by ligating the hybridised
selection oligonucleotides together;

f) removing a region from a gene of interest corresponding to
the hybridised product from step e);

g) forming a gene or library of genes by ligating the products
from step e) into the said gene of interest wherein the said gene of interest
is contained within a suitable expression vector.

A preferred method of selecting one or more selection oligonucleotides
from each pool in order to encode the required gene or library at step c)
is to select the selection oligonucleotides according to randomisation
strategy B, described herein. A method of producing proteins encoded by
these randomised gene libraries is also provided by the invention and
comprises the steps of:

a) transforming a suitable host cell with a gene or gene library
construct;

b) expressing the genes to form proteins;

c) purifying the proteins.

Suitable host cells, gene expression methods and purification protocols for
carrying out this method are known in the art.

In another aspect the invention provides a set of libraries of
proteins, which proteins are capable of specific binding interactions by
virtue of amino acid residues at two or more determined positions including
a first determined position and one or more other determined positions,
which set of libraries consists of:

a) 6 to 20 libraries in which each library has one or several but
less than 20 amino acid residues at the said first determined position and is
randomised at the said one or more other determined positions, the
arrangement being such that interaction of the 6 to 20 libraries with a
specific binding partner identifies an amino acid residue at the said first
determined position that takes part in the specific binding interaction, and

b) 6 to 20 libraries of corresponding design for each of the said
one or more other determined positions.

In another aspect the invention provides a method of
identifying a protein which interacts with a specific binding partner, which
method comprises providing a set of libraries of proteins as defined,
incubating the specific binding partner with each library of the set,
observing specific binding interactions with certain libraries of the set, and
using the observations to identify a protein which interacts with the specific
binding partner. Preferably, as discussed in more detail below, this method
may be performed using radiometric or non-radiometric detection means,
for example scintillation detection, luminescence (for example fluorescence)
detection, colorimetric detection, or imaging, by methods known in the art.

A library of compounds (e.g. genes or proteins) consists of a
plurality of compounds which are all different but which have some
characteristic in common. The compounds of the library may be presented
either separate or together, in solution or solid phase. In a set of libraries,
the compounds of any one library have some characteristic in common
which differentiates them from the compounds of each other library of the
set.

A specific binding interaction of a protein with another
molecule (the specific binding partner) is an interaction mediated by a
specified amino acid residue at one or more, usually several, positions in the
protein molecule. The specific binding partner is usually though not
necessarily a polymeric molecule, e.g. a nucleic acid (DNA or RNA) or
another protein.

In relation to proteins, the statement that a library is
randomised at a determined position is herein used to mean that the library
contains a random mixture of all or almost all possible amino acid residues.
We say "almost all" because there might be a special reason for omitting
one residue, e.g. Cys, or a few amino acid residues. In relation to genes,
the statement that a triplet is randomised is herein used to indicate a triplet
NNN (where N is any nucleotide) or a triplet that is capable of coding for all
or almost all the amino acids.

The term protein is herein used to encompass any chain of
two or more amino acid residues.

The term polynucleotide is herein used to encompass any
chain of three or more nucleotide residues, single-stranded or
double-stranded DNA or RNA.

The experimental section below describes a set of libraries of
zinc finger genes which code for a set of libraries of zinc finger proteins,
which are used to identify specific zinc fingers which interact with specific
polynucleotides. But the invention is more broadly applicable. It is in
principle possible to make a set of libraries of any protein which undergoes
a specific binding interaction, using that protein as a scaffold to vary
specific amino acid residues. It is in principle possible to make a set of
libraries of genes coding for such a set of protein libraries. And it is
possible to use such a set of protein libraries to investigate any specific
binding interaction, e.g. where the specific binding partner is a
polynucleotide or another protein or a different molecule. It may be noted
that zinc fingers may be capable of undergoing specific binding
interactions, not only with polynucleotides, but also with other proteins.

It is convenient to control the overlap between libraries of a
set of protein libraries by controlling the DNA sequences of the genes
which code for the proteins. Thus, to make a library of zinc finger proteins,
a library of zinc finger genes is first made. For convenience in relation to
what follows we quote the genetic code which relates the identities of
codons to the amino acids which they specify.



                        2nd base
1st base      A       C       G       T       3rd base

A             Lys     Thr     Arg     Ile     A
              Asn     Thr     Ser     Ile     C
              Lys     Thr     Arg     Met     G
              Asn     Thr     Ser     Ile     T

C             Gln     Pro     Arg     Leu     A
              His     Pro     Arg     Leu     C
              Gln     Pro     Arg     Leu     G
              His     Pro     Arg     Leu     T

G             Glu     Ala     Gly     Val     A
              Asp     Ala     Gly     Val     C
              Glu     Ala     Gly     Val     G
              Asp     Ala     Gly     Val     T

T             STOP    Ser     STOP    Leu     A
              Tyr     Ser     Cys     Phe     C
              STOP    Ser     Trp     Leu     G
              Tyr     Ser     Cys     Phe     T


Thus for example a codon with multiple degeneracy, e.g.
ANN, comprises 16 different triplets and codes for seven different amino
acids, namely Lys, Asn, Thr, Arg, Ser, Ile and Met.
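The ANN example can be reproduced by expanding the degenerate codon against the standard genetic code. This is a minimal sketch: the table is built from the conventional 64-letter amino acid string in T, C, A, G order, and the helper names are illustrative assumptions rather than anything defined in the specification.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expand(positions):
    """Expand a degenerate codon given as three strings of allowed bases,
    e.g. ("A", "ACGT", "ACGT") for ANN."""
    return ["".join(c) for c in product(*positions)]


def amino_acids(positions):
    """One-letter amino acids encoded by a degenerate codon (stops excluded)."""
    return {CODON_TABLE[c] for c in expand(positions)} - {"*"}


ann = ("A", "ACGT", "ACGT")
print(len(expand(ann)), sorted(amino_acids(ann)))
```

For ANN this gives 16 triplets and the seven residues K, N, T, R, S, I and M (Lys, Asn, Thr, Arg, Ser, Ile, Met).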

While it is possible in principle to use as few as six libraries of
genes to identify a particular amino acid residue, it is in practice convenient
to use twelve such libraries in groups of four, wherein libraries 1 to 4
identify the first nucleotide of a triplet, libraries 5 to 8 identify the second
nucleotide of the triplet, and libraries 9 to 12 identify the third nucleotide of
the triplet which codes for the amino acid. In this arrangement it is
preferable that only one of libraries 1 to 4 (and correspondingly only one of
libraries 5 to 8 and only one of libraries 9 to 12) codes for any particular
amino acid. These considerations give rise to various possible sets of 12
libraries of which one is shown in the following Table 1.

Table 1

Library  Position  Codon              Amino Acids Specified

1        α         A (A/C/T) N        Lys Asn Thr Ile Met
2        α         C (A/C/G) N        Gln His Pro Arg
3        α         G N N              Glu Asp Ala Gly Val
4        α         T N N              Tyr Ser Cys Trp Leu Phe
5        α         N A N              Lys Asn Gln His Glu Asp Tyr
6        α         N C N              Thr Pro Ala Ser
7        α         (C/G/T) G N        Arg Gly Cys Trp
8        α         N T N              Ile Met Leu Val Phe
9        α         (A/C/G) (A/C/T) G  Lys Thr Met Gln Pro Leu Glu Ala Val
10       α         T G G              Trp
11       α         N (A/G) C          Asn Ser His Arg Asp Gly Tyr Cys
12       α         (A/T) T C          Ile Phe



Note that any given amino acid appears only once in any set
of 4 libraries.
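The property noted above can be checked mechanically. The sketch below encodes the twelve degenerate codons as we read them from Table 1 (the table is damaged in the source text, so the specifications in `LIBRARIES`, in particular the first base of library 8, are a reconstruction and should be treated as an assumption) and confirms that each group of four libraries covers all twenty amino acids exactly once.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

N = "ACGT"
# Degenerate codon per library (allowed bases at codon positions 1, 2, 3).
# Reconstructed reading of Table 1 - an assumption, not a verified transcription.
LIBRARIES = {
    1: ("A", "ACT", N), 2: ("C", "ACG", N), 3: ("G", N, N), 4: ("T", N, N),
    5: (N, "A", N), 6: (N, "C", N), 7: ("CGT", "G", N), 8: (N, "T", N),
    9: ("ACG", "ACT", "G"), 10: ("T", "G", "G"),
    11: (N, "AG", "C"), 12: ("AT", "T", "C"),
}


def encoded(spec):
    """Amino acids (one-letter) encoded by a degenerate codon, stops excluded."""
    return {CODON_TABLE["".join(c)] for c in product(*spec)} - {"*"}


def group_is_partition(library_ids):
    """True if the libraries together encode all 20 amino acids, each once."""
    seen = [aa for i in library_ids for aa in encoded(LIBRARIES[i])]
    return len(seen) == 20 and len(set(seen)) == 20


def libraries_containing(amino_acid):
    """Library numbers whose degenerate codon can encode the amino acid."""
    return [i for i, spec in LIBRARIES.items() if amino_acid in encoded(spec)]


print([group_is_partition(range(s, s + 4)) for s in (1, 5, 9)])
print(libraries_containing("K"))
```

Under this reading, lysine is found in exactly one library of each group, which is the basis of the read-out described in the text.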

Similar randomisation can now be applied to all three
positions: α, β and γ of zinc finger proteins, to generate libraries 1-36. In
libraries 1-12, the randomisation of residue α is controlled (in these
libraries, residues β and γ are fully randomised - they are specified by the
codon NNN). Similarly, libraries 13-24 control the randomisation of position
β, and libraries 25-36 control the randomisation of residue γ.

All 36 gene libraries are expressed to generate zinc finger
libraries. These zinc finger libraries are then incubated with a
polynucleotide of interest, in such a way as to identify one library from each
group of four that binds most strongly to the polynucleotide. For example,
each library may be placed in an individual well of a microtitre plate and
there incubated with the same trinucleotide.

Consider the controlled randomisation of residue α. Because in
any one group of 4 libraries each amino acid is encoded only once, each amino
acid, as residue α, will occur in only three of the twelve libraries:

Presence / absence of an amino acid at position α within any
given library is a direct result of the controlled randomisation and the
genetic code.

This may now be applied to the assay. Consider that libraries
1-12 only are screened with the trinucleotide ATG and that in order for a
zinc finger to bind ATG, residue α must be Lys (lysine). An assay of
libraries 1-12 is performed:



[Figure: microtitre plate assay of libraries 1-12 (position α). The wells of
libraries 1, 5 and 9 emit light; all other wells are dark. Beneath each
library the figure gives its fixed nucleotide (libraries 1-4: A, C, G, T;
libraries 5-8: A, C, G, T; libraries 9-12: G, G, C, C) and the position of
that fixed nucleotide within the codon.]



Only libraries 1, 5 and 9 contain lysine as residue α, therefore
only these libraries can emit light. None of the other libraries can emit
light, because none of them specify lysine as residue α. However, this is
not the limit of our knowledge. We know the identity of the fixed nucleotide
within each library. Moreover, we can read this off directly from the
microtitre plate. In this case, the order of fixed nucleotides is AAG.

Thus, simply from the unique combination of libraries which
emit light, we know the genetic code for the amino acid required as residue
α. In this case, the essential fixed nucleotides are AAG, which specifies
lysine. We have now linked the genetic code directly to the physical
properties of a protein.
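The read-out just described amounts to a small lookup. The following sketch assumes one fixed nucleotide per library as read from Table 1 (libraries 9 and 10 fixed as G, libraries 11 and 12 fixed as C at the third position); the dictionary `FIXED` and the function `decode` are hypothetical names introduced only for illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# library number -> (codon position index, fixed nucleotide); assumed reading.
FIXED = {
    1: (0, "A"), 2: (0, "C"), 3: (0, "G"), 4: (0, "T"),
    5: (1, "A"), 6: (1, "C"), 7: (1, "G"), 8: (1, "T"),
    9: (2, "G"), 10: (2, "G"), 11: (2, "C"), 12: (2, "C"),
}


def decode(positive_libraries):
    """Read the codon off the set of libraries that gave a signal.

    Expects exactly one positive library per codon position."""
    codon = [None, None, None]
    for lib in sorted(positive_libraries):
        pos, base = FIXED[lib]
        if codon[pos] is not None:
            raise ValueError("more than one positive library for position %d" % (pos + 1))
        codon[pos] = base
    if None in codon:
        raise ValueError("no positive library for some codon position")
    return "".join(codon)


codon = decode({1, 5, 9})
print(codon, CODON_TABLE[codon])
```

With signals from libraries 1, 5 and 9 the fixed nucleotides read AAG, which the genetic code assigns to lysine.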

This principle may be applied to all 36 libraries. In so doing,
the genetic codes and thus required identities of all three residues α, β and
γ will be determined:

This is possible, because in libraries 1-12, residues β and γ
are fully randomised. Therefore, in each of libraries 1-12 Ser and Arg are
present as residues β and γ within the mixture.

Similarly, when controlled randomisation is applied to residue
β (libraries 13-24) residues α and γ are fully randomised and when
controlled randomisation is applied to residue γ, residues α and β are fully
randomised.

By screening the 36 libraries with each of the 64
trinucleotides, an optimum zinc finger will be found for each trinucleotide.
The result is therefore the solution of the zinc finger code whereby
DNA binding proteins may now be designed at will.

Should more than three libraries within a given set of twelve
produce a signal, then the plates may be washed to remove signals
resulting from weak interactions. An end point to the assay has been
reached when just three libraries per set of twelve generate a signal.

The above strategy generates libraries of genes which, when
expressed, yield protein libraries in which two positions are fully
randomised and one position has controlled randomisation. In practice,
this leads to libraries with between 400 (e.g. library 10) and 3600 (e.g.
library 9) constituent proteins. These numbers are calculated as follows:

Number of library constituents = product of the number of possibilities
at each position of randomisation

e.g. library 1:  position α x position β x position γ
               = 5 x 20 x 20
               = 2000 constituents (proteins)



However, these small libraries result from the degeneracy of
the genetic code. In practice, the gene libraries which encode the proteins,
randomised as above, will be far larger. For example, again consider
library 1:

Codon       α              β        γ
Sequence    A (A/C/T) N    NNN      NNN
Numbers     1x3x4     x    4x4x4  x 4x4x4  = 49152 constituents (genes)
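The two calculations can be set side by side in a short sketch. The function names are illustrative only; the figures used (5, 9 and 1 amino acids at position α for libraries 1, 9 and 10, and the codon A (A/C/T) N for library 1) are those given in the text.

```python
from math import prod


def protein_library_size(n_controlled, n_random_positions=2, n_amino_acids=20):
    """Proteins in a library: choices at the controlled position multiplied by
    20 for each fully randomised position."""
    return n_controlled * n_amino_acids ** n_random_positions


def gene_library_size(controlled_codon, n_random_positions=2):
    """Genes in a library when randomised positions are encoded by NNN.

    controlled_codon is three strings of allowed bases, e.g. ("A", "ACT", "ACGT")."""
    controlled = prod(len(bases) for bases in controlled_codon)
    return controlled * (4 ** 3) ** n_random_positions


print(protein_library_size(5))                    # library 1 proteins
print(protein_library_size(1), protein_library_size(9))  # libraries 10 and 9
print(gene_library_size(("A", "ACT", "ACGT")))    # library 1 genes
```

For library 1 the gene library is therefore roughly 25 times larger than the protein library it encodes (49152 against 2000).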

The generation of such libraries should not be problematic
technically, since libraries far larger than these exist already (e.g. Choo and
Klug, 1994, PNAS 91, 11163-7). However, it may prove beneficial to
reduce the gene library sizes to those of the protein libraries. Potential
benefits include:

• greater likelihood of full representation within each library (all
constituent proteins encoded);

• even representation of each constituent (an equal amount of
each constituent protein within a given library);

• consistent optimum codon usage (to maximise expression).

These attributes are desirable because of the degeneracy of
the genetic code. Again consider library 1. Within this library, position β is
encoded by NNN. When expressed, therefore, residue β is 6 times more
likely to be serine than it is to be methionine, because serine is encoded six
times within NNN for each encoding of methionine.
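The six-fold bias follows directly from counting codons per amino acid in the standard genetic code, as this brief sketch shows (variable names are illustrative).

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of NNN triplets that encode each amino acid ("*" marks a stop codon).
codons_per_residue = Counter(CODON_TABLE.values())

ser, met = codons_per_residue["S"], codons_per_residue["M"]
print(ser, met, ser / met)
print(codons_per_residue["*"])  # termination codons present within NNN
```

The same count shows that NNN also contains three termination codons, which would give rise to truncated proteins.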

Such bias within libraries may have an adverse effect on the
results of the assay. Any detrimental effect is predicted to be minor - it
should occur only if two proteins have similar binding affinities with a given
DNA sequence. However, such an eventuality is possible: consider that
two zinc fingers with positions α=Arg, β=Ser, γ=Lys and α=Arg, β=Met,
γ=Lys bind similarly to a given sequence of DNA, with α=Arg, β=Met, γ=Lys
being the optimally binding zinc finger protein. During the assay, the
effective concentration of the protein containing serine at position β would
be greater than that of the protein containing methionine. Thus, the serine-
containing protein might give a stronger signal even though it is not the
optimum zinc finger for that DNA sequence.

It may therefore be preferred to substitute the codon MAX for
positions of full randomisation (previously NNN), where MAX is a mixture
containing only the following codons:

AAA, AAC, ACC, AGC, ATG, ATT, CAG, CAT, CCG, CGC, CTG, GAA, GAT, GCG, GGC, GTG,
TAT, TGG, TGC, TTT.

These codons represent those most favoured by E. coli for
each amino acid (Nakamura et al., (1997), Nucleic Acids Research, 25,
244-245).
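A quick consistency check on the MAX set is sketched below: translated with the standard genetic code, the twenty codons should give twenty different amino acids and no termination codon. The check says nothing about E. coli codon preference, which rests on the cited reference.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

MAX = ["AAA", "AAC", "ACC", "AGC", "ATG", "ATT", "CAG", "CAT", "CCG", "CGC",
       "CTG", "GAA", "GAT", "GCG", "GGC", "GTG", "TAT", "TGG", "TGC", "TTT"]

# Amino acid encoded by each MAX codon (one-letter code).
max_residues = [CODON_TABLE[codon] for codon in MAX]

print(len(MAX), len(set(max_residues)), "*" in max_residues)
```

Because each amino acid is encoded exactly once, a position encoded by MAX has one gene per protein, unlike a position encoded by NNN.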

In order to employ these codons in controlled randomisation,
a new division of the codons into sets of 12 libraries is required, as outlined
in randomisation strategy B:

The changes in controlled randomisation will affect the library
numbers which produce a signal and therefore the interpretation of the
assay results. However, the principles of controlled randomisation and the
mechanism of assay interpretation remain unchanged. Using
randomisation strategy B, the example illustrated above is reiterated:

Randomisation strategy A is, in principle, the easier strategy to
implement technically. However, strategy B is preferred. Gene libraries of
much smaller size are required. Although construction of these
highly-controlled libraries is technically demanding, it is much more likely
that the libraries encode all required proteins and moreover that those
proteins are encoded in similar proportions, so removing potential
difficulties in the SPA library assays.

Construction of these gene libraries may be achieved by
cloning oligonucleotide cassettes between two appropriately positioned
restriction sites which flank positions α and γ. Construction of the
oligonucleotide cassettes requires a set of sixty-one oligonucleotides
comprising one fully-randomised "template" oligonucleotide and three pools
of selection oligonucleotides. The template oligonucleotide is of sequence

3'-----NNN-----NNN-----NNN-----5'

where "-----" represents the invariant DNA and NNN the positions of
randomisation within the non-coding strand of the gene. The intervening
sequences "-----" are conveniently between 3 and 21 bases in length.

The pools of selection oligonucleotides contain twenty
individual oligonucleotides of sequence

Lys: 5'-----AAA-----3'
Asn: 5'-----AAC-----3'
Thr: 5'-----ACC-----3'
Ser: 5'-----AGC-----3'
Met: 5'-----ATG-----3'
Ile: 5'-----ATT-----3'
Gln: 5'-----CAG-----3'
His: 5'-----CAT-----3'
Pro: 5'-----CCG-----3'
Arg: 5'-----CGC-----3'
Leu: 5'-----CTG-----3'
Glu: 5'-----GAA-----3'
Asp: 5'-----GAT-----3'
Ala: 5'-----GCG-----3'
Gly: 5'-----GGC-----3'
Val: 5'-----GTG-----3'
Tyr: 5'-----TAT-----3'
Trp: 5'-----TGG-----3'
Cys: 5'-----TGC-----3'
Phe: 5'-----TTT-----3'



where the sequence "-----" is of suitable length and base sequence to
base pair with the non-variant regions of the template and the defined
codon corresponds to one of those comprising the "MAX" set of codons
(defined herein at page 18, line 5). The defined codon corresponds to a
position of randomisation and must be either at or near to one end of the
oligonucleotide. A complete selection pool represents a set of twenty such
oligonucleotides, in order that all codons contained within "MAX" are
represented and all twenty amino acids are encoded.

The invention enables fully randomised libraries, positionally
fixed libraries and individual genes to be constructed. Oligonucleotides
encoding the required amino acid at each position of randomisation would
be taken from each selection pool. For example, if full randomisation is
required at a given position, then all 20 selection oligonucleotides would be
taken. If positional fixing were required, then all oligonucleotides where the
"MAX" codon begins with A (for example) would be taken. If a single amino
acid were required at the position of randomisation, the single selection
oligonucleotide corresponding to that amino acid would be taken.
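The three modes of selection described above can be sketched as simple filters over the MAX set. The function names are hypothetical, and only the defined codon of each selection oligonucleotide is modelled (the flanking invariant sequence is ignored).

```python
# MAX codon -> amino acid it encodes, as listed for the selection pools.
MAX_CODONS = {
    "AAA": "Lys", "AAC": "Asn", "ACC": "Thr", "AGC": "Ser", "ATG": "Met",
    "ATT": "Ile", "CAG": "Gln", "CAT": "His", "CCG": "Pro", "CGC": "Arg",
    "CTG": "Leu", "GAA": "Glu", "GAT": "Asp", "GCG": "Ala", "GGC": "Gly",
    "GTG": "Val", "TAT": "Tyr", "TGG": "Trp", "TGC": "Cys", "TTT": "Phe",
}


def select_full():
    """Full randomisation: take all twenty selection oligonucleotides."""
    return sorted(MAX_CODONS)


def select_fixed(base, position=0):
    """Positional fixing: take the oligonucleotides whose MAX codon has the
    given base at the given codon position (0, 1 or 2)."""
    return sorted(c for c in MAX_CODONS if c[position] == base)


def select_single(amino_acid):
    """Single residue: take the one oligonucleotide encoding that amino acid."""
    return [c for c, aa in MAX_CODONS.items() if aa == amino_acid]


print(len(select_full()))
print(select_fixed("A"))
print(select_single("Lys"))
```

Fixing the first base as A, for example, selects the six codons for Lys, Asn, Thr, Ser, Met and Ile.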
Construction of a single zinc finger gene encoding α=Lys, β=Ser,
γ=Arg

The selection oligonucleotides β-Ser and γ-Arg are treated
with T4 polynucleotide kinase and ATP in order to attach 5' phosphate
groups and so enable them to participate in ligation reactions. These two
oligonucleotides, together with the selection oligonucleotide α-Lys and the
template oligonucleotide, are combined, heated to 90°C and allowed to
cool slowly to room temperature, in order to allow complementary
sequences of DNA to base pair as shown below:



[Figure: the selection oligonucleotides α-Lys (AAA), β-Ser (AGC) and γ-Arg
(CGC), taken from pools α, β and γ, annealed to the template (one
fully-randomised oligonucleotide, 3'-NNN-NNN-NNN-5'), with a half
restriction site marked on the cassette. The key distinguishes the invariant
DNA sequence within pool α, within pool β and within pool γ, and the
invariant DNA sequence of the template oligonucleotide.]

The resulting oligonucleotide cassette is then inserted into the appropriate
restriction sites in the zinc finger gene, so generating the zinc finger gene
α=Lys, β=Ser, γ=Arg. None of the other sequences contained in the
template oligonucleotide are cloned, since only the double stranded DNA
cassette will be ligated into the parental gene. Selection from the template
oligonucleotide is thus achieved by addition of the three selection
oligonucleotides.



Construction of zinc finger library 1

The selection oligonucleotides β-MAX and γ-MAX (where
MAX = an entire selection pool) are treated with T4 polynucleotide kinase
and ATP in order to attach 5' phosphate groups and so enable them to
participate in ligation reactions. These two oligonucleotide pools, together
with the selection oligonucleotide α-MIX 1, where MIX 1 is the following
mixture of oligonucleotides:

α-Lys: 5'-----AAA-----3'
α-Asn: 5'-----AAC-----3'
α-Thr: 5'-----ACC-----3'
α-Ser: 5'-----AGC-----3'
α-Met: 5'-----ATG-----3'
α-Ile: 5'-----ATT-----3'

and the template oligonucleotide are combined, heated to 90°C and
allowed to cool slowly to room temperature, in order to allow
complementary sequences of DNA to base pair as above.

The resulting mixture of oligonucleotide cassettes is then
inserted into the appropriate restriction sites in the zinc finger gene, so
generating the zinc finger library 1. None of the other sequences contained
in the template oligonucleotide are cloned, since only the double stranded
DNA cassettes will be ligated into the parental gene. Selection from the
template oligonucleotide is thus achieved by addition of the three pools of
selection oligonucleotides. Note that the number of genes exactly matches
the number of encoded proteins and that no truncated proteins should
result, since "MAX" contains no termination codons.
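The one-gene-per-protein property of this library can be illustrated by enumeration. The sketch below models only the three randomised codons (MIX 1 at α, the full MAX pool at β and γ) and ignores the invariant sequence between them; the names are illustrative.

```python
from itertools import product

# MAX codon -> one-letter amino acid.
MAX = {
    "AAA": "K", "AAC": "N", "ACC": "T", "AGC": "S", "ATG": "M", "ATT": "I",
    "CAG": "Q", "CAT": "H", "CCG": "P", "CGC": "R", "CTG": "L", "GAA": "E",
    "GAT": "D", "GCG": "A", "GGC": "G", "GTG": "V", "TAT": "Y", "TGG": "W",
    "TGC": "C", "TTT": "F",
}
MIX_1 = ["AAA", "AAC", "ACC", "AGC", "ATG", "ATT"]  # codons offered at position alpha

# Every combination of (alpha, beta, gamma) codons present in library 1.
genes = [(a, b, g) for a, b, g in product(MIX_1, MAX, MAX)]
# The (alpha, beta, gamma) residues each gene encodes.
proteins = {tuple(MAX[codon] for codon in gene) for gene in genes}

print(len(genes), len(proteins))
```

Under these assumptions the library contains 6 x 20 x 20 = 2400 genes encoding 2400 distinct proteins, with no termination codon at any randomised position.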

Generalised application to randomised peptides

The above technique may also be used to generate genes
encoding fully randomised peptides, without intervening conserved gene
sequences. Again, the number of genes will exactly match the number of
encoded peptides. In the case of a fully randomised peptide library without
positional fixing, just 21 oligonucleotides are required: a fully-randomised
template oligonucleotide of the desired length and a set of the twenty
"MAX" trinucleotides. Annealing between the set of "MAX" trinucleotides
and the template will generate cassettes encoding all possible peptides,
dependent on complete representation within the template oligonucleotide,
which will decrease with oligonucleotide length.

Positionally fixed, random peptides may be made similarly,
although a set of twelve templates will be required for each codon. Here,
for a given codon, the non-coding template strand will be fixed alternatively
as T, G, C and A at each nucleotide and the "MAX" trinucleotides annealed
as above.

The above strategies A and B involve designing sets of
libraries of genes which in turn may be expressed to generate
corresponding libraries of proteins.

The method of the invention involves incubating a set of
libraries of proteins with a specific binding partner, observing specific
binding interactions with certain libraries of the set, and using the
observations to identify a protein which interacts with the specific binding
partner. Although other assay techniques are possible, this method is
preferably performed using scintillation proximity assay (SPA) technology.
Briefly, this technology involves providing a support which comprises a
scintillant which emits light when subjected to electrons (e.g. β particles) or
other forms of radiation resulting from decomposition of a radioisotope.
The support may be massive, e.g. the base of each well of a microtitre
plate, or may be particulate. One assay reagent is immobilised on the
support. Another assay reagent is radiolabeled and is partitioned between
two fractions, one bound to the support and the other free in solution. The
relative size of the two fractions is arranged to be related to the presence or
the concentration of an analyte of interest. The radioisotope is chosen
such that reagent bound to the support causes the scintillant in the support
to emit light, while reagent free in solution does not (on account of the short
mean free path of the radiation) significantly affect the scintillant substance.

Various assay formats are possible. For example, each
library of a set of libraries can be immobilised in an individual well, either of
a standard microtitre plate or of a scintillant-containing microtitre plate. A
specific binding partner of the proteins is labelled and introduced into each
well. Labels can be radiometric, luminescent (for example fluorescent) or
may be enzyme. Where radiometric or luminescent labels are used, a
specific binding interaction can be investigated in real time. Where enzyme
labels are used the interaction can be investigated upon the addition of the
appropriate reagents needed to generate a signal. Where several wells
emit a signal, repeated washing can be used to remove weakly interacting
species until the specific binding partner remains bound only in a single
well. This ability to identify a single library (as opposed to a small pool of
libraries) that binds most strongly to any particular specific binding partner is
a valuable feature, and an advance on assay techniques used previously
for similar purposes.

Alternatively, the specific binding partner can be immobilised
in each well of the SPA microtitre plate. Each protein library is
radiolabeled and introduced into a different well of the plate for interaction
with the specific binding partner. Alternative assay formats, in which
neither the protein library nor its specific binding partner, but rather a third
reagent is radiolabeled, are well known in the art.

Techniques for immobilising protein or other assay reagents
on SPA surfaces in forms suitable for taking part in SPA assays are well
known in the art. Development of suitable techniques should not amount to
more than the routine optimisation ordinarily required for assays of this
kind. Detection of interactions by non-radioactive assay and imaging
techniques such as luminescent (for example fluorescent) detection or
colorimetric detection of interactions between, for example, biotin linked
and streptavidin linked partners is also envisaged.

Most zinc finger proteins form the DNA recognition module of
transcription factors, which serve to switch genes on or off. Already,
several examples exist where novel transcription factors have been
engineered by changing their zinc fingers (Choo et al. (1994), Nature 372,
642-5). Similarly, zinc fingers have been linked to restriction endonuclease
cleavage domains to generate novel restriction endonucleases (e.g. Kim et
al. (1996), PNAS 93, 1156-60). The application of zinc fingers is almost
limitless: whenever a need arises to link something to a specific sequence
of DNA, it can be met with a series of zinc fingers. However, in order to
design DNA-binding proteins at will, there must be available one zinc finger
for each trinucleotide. This invention provides enabling technology to
achieve that object.
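
The requirement of one zinc finger for each trinucleotide can be made concrete with a short Python sketch. It enumerates the possible trinucleotides and splits a target site into the consecutive triplets that a series of fingers would each have to recognise. The function names are hypothetical, and the one-finger-per-triplet reading is the simplification stated in the text above.

```python
# Minimal sketch: count the trinucleotides and split a target site into
# triplets, assuming one zinc finger per trinucleotide.
from itertools import product

BASES = "ACGT"


def all_trinucleotides():
    """Return every possible DNA trinucleotide as a string."""
    return ["".join(triplet) for triplet in product(BASES, repeat=3)]


def split_target(site):
    """Split a target site into consecutive trinucleotides."""
    if len(site) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    return [site[i:i + 3] for i in range(0, len(site), 3)]


if __name__ == "__main__":
    print(len(all_trinucleotides()))   # number of fingers needed for full coverage
    print(split_target("GGGGGGGGG"))   # triplets for a three-finger protein
```

Under these assumptions 4 x 4 x 4 = 64 fingers would cover every trinucleotide, and a three-finger protein addresses a 9 base pair site.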

Example 

The example involves a single protein, comprising three zinc
fingers. Controlled randomisation is applied only to the central zinc finger.
The two outer zinc fingers are present simply to ensure correct registry with
the target DNA sequence and to increase overall binding strength (Choo
and Klug (1994) PNAS 91, 11163-67; Berg (1997) Nature Biotech. 15,
323).

The work is divided into four stages: gene synthesis; gene
expression; radiometric and colorimetric assay formats; and assay results and
proof of principle.

Gene Synthesis: 

A gene was designed and synthesised to encode the protein
(SEQ ID NO: 1)

TGEKPYKCPECGKSFSKKSHLVAHQRTH
TGEKPYKCPECGKSFSKKSHLVAHQRTH
TGEKPYKCPECGKSFSKKSHLVAHQRTH.

Key (residues numbered 1 to 28 within each repeat):

Linker residues: TGEKP (residues 1-5)
Zinc co-ordinating residues: Cys8, Cys11, His24 and His28
DNA-contacting residues (α, β and γ) (positions -1, +3 and +6): Lys17, His20 and Ala23

This protein corresponds to three repeats of Berg's
consensus zinc finger sequence (Krizek et al. (1991) JACS 113, 4518-23),
with DNA-contacting residues from the first zinc finger of transcription
factor Sp1 (Berg (1992) PNAS 89, 11109-10; Shi and Berg (1995) Chem
& Biol. 2, 83-89). Each zinc finger sequence is preceded by a Krüppel-type
linker peptide (Choo and Klug (1993) NAR 21, 3341-6). By analogy to
previous precedent (Shi and Berg, 1995), the three repeats of this novel
zinc finger peptide are expected to bind to the dsDNA sequence
5'-GGG GGG GGG-3'.

To maximise gene expression, E. coli codon preference was
employed on converting the sequence into DNA (Wada et al. (1992)
NAR 20 suppl., 2111-8). Wherever possible, first preference codons were
used. However, in some instances, second preference codons were also
employed. This limited sequence repetition within the gene, which is necessary to
prevent potential intragenic recombination events that would be
deleterious to ensuing experiments. In practice, a maximum repeat length
of 8 base pairs was mostly achieved. Use of second preference codons
also allowed the incorporation of restriction enzyme sites within the gene.
The final gene sequence, restriction sites and codon usage are illustrated
in Figure 1.
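
The reason for mixing in second preference codons can be illustrated with a short Python sketch. It back-translates three identical repeats using a single codon per residue and then measures the longest stretch of DNA that occurs more than once. The codon table is an assumed one-codon-per-residue table chosen for illustration, not the codon usage of Figure 1; the 28-residue repeat is entered as read from SEQ ID NO: 1 and should be checked against the Sequence Listing; the function names are hypothetical.

```python
# Illustrative sketch only. PREFERRED_CODON is an assumed single-codon table,
# not the codon usage actually adopted for the gene shown in Figure 1.
PREFERRED_CODON = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGC",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}

# One linker-plus-finger repeat, as read from SEQ ID NO: 1 (assumption).
FINGER = "TGEKPYKCPECGKSFSKKSHLVAHQRTH"


def back_translate(protein):
    """Translate a protein into DNA using one fixed codon per residue."""
    return "".join(PREFERRED_CODON[residue] for residue in protein)


def longest_repeat(dna):
    """Length of the longest substring occurring at least twice (overlaps allowed)."""
    length = 0
    while True:
        seen = set()
        repeated = False
        # Scan all substrings of size length + 1 and look for a duplicate.
        for start in range(len(dna) - length):
            kmer = dna[start:start + length + 1]
            if kmer in seen:
                repeated = True
                break
            seen.add(kmer)
        if not repeated:
            return length
        length += 1


if __name__ == "__main__":
    gene = back_translate(FINGER * 3)
    print(len(gene), longest_repeat(gene))
```

Because the three repeats are identical at the protein level, a single-codon back-translation repeats whole fingers at the DNA level, far beyond the 8 base pair limit mentioned above. Substituting alternative codons at selected positions is what breaks these repeats up.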

Gene Expression 

In the current assay format, the zinc finger gene is fused to
the glutathione-S-transferase gene in the vector pGEX2TK (Amersham
Pharmacia Biotech). Expression of this construct leads to a 36.5 kD
protein comprising GST at the amino terminus and the zinc finger protein at
the carboxyl terminus. Gene expression is performed in E. coli BL21 cells
according to manufacturer's instructions. The resulting fusion protein is
then purified using glutathione-Sepharose (Amersham Pharmacia Biotech)
according to manufacturer's instructions. Use of the pGEX2TK vector
allows for the subsequent radiolabelling of the protein if required.

Assay formats for assessing zinc finger - DNA interactions 

Direct attachment of GST fusion protein to microtitre plates, followed by
colorimetric detection of biotinylated DNA (Assay format 1)

GST or GST ZF protein (4 pmoles per well) was immobilised
in microtitre wells in carbonate buffer, pH 9.2, for 18 hrs. The plates were
washed three times in TBS-Tween (0.3% Tween) and then blocked in the
same buffer for 3 hrs. After washing, 2-fold serial dilutions of DNA were
added to each well. The protein and DNA were incubated together for 2
hrs at room temperature, and the wells were then washed 3 times in
TBS-Tween. As negative controls, experiments were performed in the absence
of DNA, to assess binding of GST / GST ZF proteins by the streptavidin
conjugate. Bound DNA was detected by adding streptavidin / peroxidase
conjugate, which was removed by 3 washes in TBS. Finally, the conjugate



WO 00/15777 






PCT/GB99/03081 



was detected colorimetrically according to manufacturer's instructions. All
reactions were performed in duplicate. Figure 1 demonstrates that
interaction between the zinc finger protein and its target DNA sequence
may be assessed using this assay format. In Figures 1, 2 and 3, the legend
'bkg' denotes background detection levels.
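
The handling of a 2-fold dilution series read in duplicate can be sketched as follows. This is an illustrative calculation only: the starting amount, the readings and the choice to subtract the background from the duplicate mean are assumptions, and the function names are hypothetical.

```python
# Illustrative sketch: 2-fold dilution series and duplicate readings.
# Starting amount, readings and background value are invented examples.

def two_fold_dilutions(start, steps):
    """Return `steps` amounts, each half of the previous one."""
    return [start / (2 ** i) for i in range(steps)]


def background_corrected(duplicates, background):
    """Average each pair of duplicate readings and subtract the background."""
    return [(first + second) / 2 - background for first, second in duplicates]


if __name__ == "__main__":
    amounts = two_fold_dilutions(8.0, 4)
    readings = [(1.20, 1.24), (0.80, 0.76), (0.45, 0.47), (0.22, 0.26)]
    corrected = background_corrected(readings, background=0.10)
    for amount, signal in zip(amounts, corrected):
        print(f"{amount:g}\t{signal:.2f}")
```

Plotting the corrected signal against the amount of DNA gives the kind of titration curve from which binding to GST ZF and to GST alone can be compared.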

Direct attachment of GST fusion protein to microtitre plates, followed by 
scintillation-based detection of radiolabelled DNA (Assay format 2) 

GST or GST ZF protein (4 pmoles per well) was immobilised
in microtitre wells in carbonate buffer, pH 9.2, for 18 hrs. The plates were
washed three times in TBS-Tween (0.3% Tween) and then blocked in the
same buffer for 3 hrs. After washing, 2-fold serial dilutions of radiolabelled
DNA were added to each well. The protein and DNA were incubated
together for 2 hrs at room temp, and the wells were then washed 3 times in
TBS-Tween. Bound DNA was detected by scintillation counting. All
reactions were performed in duplicate. Figure 2 demonstrates that
interaction between the zinc finger protein and its target DNA sequence
may be assessed using this assay format.

Antibody-based attachment of GST fusion protein to microtitre plates,
followed by scintillation-based detection of radiolabelled DNA (Assay
format 3)

One µg of protein A was attached to the surface of each
microtitre well in carbonate buffer, pH 9.2, for 18 hrs. The plates were
washed three times in TBS-BSA (2% BSA) and then blocked in the same
buffer for 3 hrs. Anti-GST antibody (1 µg) was added to each well in the
same buffer and incubated at room temperature with rocking for 1 hr. The
plates were washed 3 times in TBS-BSA and then incubated for 1 hr with 4
pmoles GST / GST ZF protein per well. After washing away unbound
protein, the plates were incubated for 2 hrs at room temp with 2-fold serial
dilutions of radiolabelled DNA. Unbound DNA was removed by 3 washes
in TBS-BSA. As negative controls, experiments were performed in the
absence of antibody, to assess any binding of radiolabeled DNA by protein
A. All reactions containing GST / GST ZF were performed in duplicate.
Figure 3 demonstrates that interaction between the zinc finger protein and
its target DNA sequence may be assessed using this assay format.



Conclusion 

Three adsorption-based assay formats have been developed.
All assay formats demonstrate interaction between the protein and its DNA
target sequence. In each case, the protein is immobilised and the DNA is
in solution. Labelled DNA is bound by the immobilised protein and then
detected according to the nature of the label. Radiolabeled DNA is
detected using scintillation-based methods or appropriate imaging
technology. Non-radiometrically labelled DNA is detected using
colorimetric techniques and a spectrophotometer. The assay formats are
also applicable to fluorescently labelled DNA, where imaging technology
would be used to detect the bound DNA.
CLAIMS



1. A set of libraries of genes which code for proteins which are
capable of specific binding interactions with a specific binding partner by
amino acid residues at at least two specified positions including a first
specified position and at least one other specified position, which set of
libraries consists of:

a) 6 to 20 libraries in which each library has a triplet that codes
for at least one but less than 20 amino acids at said first specified position,
and is randomised at the or each triplet coding for the said at least one
other specified position, the arrangement being such that interactions of the
proteins coded for by the said 6 to 20 libraries with a specific binding
partner identifies a triplet that codes for an amino acid at the said first
specified position that takes part in the specific binding interaction, and

b) 6 to 20 libraries in each of which libraries said first specified
position is randomised and a different one of said at least one other
specified positions has a triplet that codes for at least one but less than 20
amino acids.



2. The set of libraries of genes as claimed in claim 1, which set
of libraries consists of:

a) 12 libraries in which each library has a triplet that codes for
one or several but less than 20 amino acids at the said first determined
position, the triplets being as shown in Table 1 or Table 2, and

b) 12 libraries of corresponding design for each of the said one
or more other determined positions.

3. The set of libraries of genes as claimed in claim 1 or claim 2,
wherein the genes code for zinc fingers.

4. The set of libraries of genes as claimed in claim 3, which set
consists of 36 libraries in three groups of 12 libraries which code for amino
acids at the -1 and +3 and +6 positions respectively.

5. The set of libraries of genes as claimed in claim 3 or claim 4, 
wherein each gene codes for a protein comprising 3 zinc fingers. 

6. The set of libraries of genes as claimed in claim 5, wherein
each gene codes for a protein having the sequence (SEQ ID NO: 2)

TGEKPYKCPECGKSFSXKSXLVXHQRTH
TGEKPYKCPECGKSFSXKSXLVXHQRTH
TGEKPYKCPECGKSFSXKSXLVXHQRTH,

where X is any amino acid.

7. A set of libraries of proteins, which proteins are capable of
specific binding interactions with a specified binding partner by amino acid
residues at at least one specified position including a first specified position
and at least one other specified position, which set of libraries consists of:

a) 6 to 20 libraries in which each library has at least one but less
than 20 amino acid residues at the said first specified position and is
randomised at the said at least one other determined position, the
arrangement being such that interaction of the 6 to 20 libraries with a
specific binding partner identifies an amino acid residue at the said first
specified position that takes part in the specific binding interaction, and

b) 6 to 20 libraries in each of which libraries said first specified
position is randomised and a different amino acid is present at at least one
other specified position.
8. The set of libraries of proteins as claimed in claim 7, which
set of libraries consists of

a) 20 libraries in which each library has one specified amino acid
residue at the said first determined position and is randomised at the said
one or more other determined positions, and

b) 20 libraries of corresponding design for each of the said one
or more other determined positions.

9. The set of libraries of proteins as claimed in claim 7 or claim
8, wherein the proteins are zinc fingers.

10. The set of libraries of proteins as claimed in claim 7, which 
set consists of 60 libraries in three groups of 20 libraries with specified 
amino acids at the -1 and +3 and +6 positions respectively. 

11. The set of libraries of proteins as claimed in claim 9 or claim
10, wherein each protein comprises three zinc fingers.

12. The set of libraries of proteins as claimed in claim 11, wherein
each protein has the sequence (SEQ ID NO: 2)

TGEKPYKCPECGKSFSXKSXLVXHQRTH
TGEKPYKCPECGKSFSXKSXLVXHQRTH
TGEKPYKCPECGKSFSXKSXLVXHQRTH,

where X is any amino acid.

13. A set of libraries of genes which code for the set of libraries of
proteins defined in any one of claims 7 to 12.



14. A method of identifying a protein which interacts with a
specific binding partner, which method comprises providing a set of
libraries of proteins as defined in any one of claims 7 to 12, incubating the
specific binding partner with each library of the set, observing specific
binding interactions with certain libraries of the set, and using the
observations to identify a protein which interacts with the specific binding
partner.

15. The method as claimed in claim 14, wherein the specific
binding partner is a polynucleotide.

16. The method as claimed in claim 14, wherein the specific
binding interactions are observed by radiometric or luminescent assay.

17. The method as claimed in claim 14, wherein the specific
binding interactions are observed by imaging means.

18. The method as claimed in claim 14, wherein the specific
binding interactions are observed by scintillation proximity assay.

19. The method as claimed in claim 18, wherein the sets of
libraries of proteins are immobilised on scintillation proximity assay
surfaces and the specific binding partner is radiolabeled.

20. The method of claim 18 or claim 19, wherein after incubation
the scintillation proximity assay surfaces are washed to distinguish stronger
specific binding interactions from weaker ones.
21. The method as claimed in claim 14, wherein the specific
binding interactions are observed by colorimetric means.

22. The method as claimed in claim 21, wherein the specific
binding partner is biotinylated and the specific binding interaction is
detected using a signal generating streptavidin conjugate.

23. The method as claimed in claim 21 or claim 22 wherein after
incubation the binding interactions are washed to distinguish stronger
specific binding interactions from weaker ones.

24. A protein having the sequence (SEQ ID NO: 1)

TGEKPYKCPECGKSFSKKSHLVAHQRTH
TGEKPYKCPECGKSFSKKSHLVAHQRTH
TGEKPYKCPECGKSFSKKSHLVAHQRTH.

25. A gene which codes for the protein of claim 24.

26. A method of constructing randomised gene libraries in which
the number of genes is the same as the number of encoded proteins and
which contain no termination codons at the predetermined positions of
randomisation, the method comprising the steps of:

a) providing a template oligonucleotide which is fully randomised
at predetermined codon positions;

b) for each predetermined codon position providing a pool of
selection oligonucleotides, wherein each member of said pool contains a
different codon selected from the group consisting of



AMENDED SHEET 



GB 009903081 



AAA, AAC, ACC, AGC, ATG, ATT, CAG, CAT, CCG, CGC, CTG, GAA,
GAT, GCG, GGC, GTG, TAT, TGG, TGC, TTT

at the predetermined codon position;

c) selecting one or more selection oligonucleotides from each
pool in order to encode the required gene or library;

d) allowing the ligated selected oligonucleotides from each pool
to hybridise with the template oligonucleotide;

e) forming one or more constructs by ligating the hybridised
selection oligonucleotides together;

f) removing a region from a gene of interest corresponding to
the hybridised product from step e);

g) forming a gene library or genes by ligating the products
from step e) into the said gene of interest wherein the said gene of interest
is contained within a suitable expression vector.

27. A method of producing proteins encoded by the randomised
gene libraries of claim 26 comprising the steps of:

a) transforming a suitable host cell with the gene or gene
library of claim 27 construct;

b) expressing the genes to form proteins;

c) purifying the proteins.